knn retrieval
Efficient k-Nearest-Neighbor Machine Translation with Dynamic Retrieval
Gao, Yan, Cao, Zhiwei, Miao, Zhongjian, Yang, Baosong, Liu, Shiyu, Zhang, Min, Su, Jinsong
To achieve non-parametric NMT domain adaptation, $k$-Nearest-Neighbor Machine Translation ($k$NN-MT) constructs an external datastore to store domain-specific translation knowledge, which derives a $k$NN distribution to interpolate the prediction distribution of the NMT model via a linear interpolation coefficient $\lambda$. Despite its success, $k$NN retrieval at each timestep leads to substantial time overhead. To address this issue, dominant studies resort to $k$NN-MT with adaptive retrieval ($k$NN-MT-AR), which dynamically estimates $\lambda$ and skips $k$NN retrieval if $\lambda$ is less than a fixed threshold. Unfortunately, $k$NN-MT-AR does not yield satisfactory results. In this paper, we first conduct a preliminary study to reveal two key limitations of $k$NN-MT-AR: 1) the optimization gap leads to inaccurate estimation of $\lambda$ for determining $k$NN retrieval skipping, and 2) using a fixed threshold fails to accommodate the dynamic demands for $k$NN retrieval at different timesteps. To mitigate these limitations, we then propose $k$NN-MT with dynamic retrieval ($k$NN-MT-DR) that significantly extends vanilla $k$NN-MT in two aspects. Firstly, we equip $k$NN-MT with a MLP-based classifier for determining whether to skip $k$NN retrieval at each timestep. Particularly, we explore several carefully-designed scalar features to fully exert the potential of the classifier. Secondly, we propose a timestep-aware threshold adjustment method to dynamically generate the threshold, which further improves the efficiency of our model. Experimental results on the widely-used datasets demonstrate the effectiveness and generality of our model.\footnote{Our code is available at \url{https://github.com/DeepLearnXMU/knn-mt-dr}.
Federated Nearest Neighbor Machine Translation
Du, Yichao, Zhang, Zhirui, Wu, Bingzhe, Liu, Lemao, Xu, Tong, Chen, Enhong
To protect user privacy and meet legal regulations, federated learning (FL) is attracting significant attention. Training neural machine translation (NMT) models with traditional FL algorithms (e.g., FedAvg) typically relies on multi-round model-based interactions. However, it is impractical and inefficient for translation tasks due to the vast communication overheads and heavy synchronization. In this paper, we propose a novel Federated Nearest Neighbor (FedNN) machine translation framework that, instead of multi-round model-based interactions, leverages one-round memorization-based interaction to share knowledge across different clients and build low-overhead privacy-preserving systems. The whole approach equips the public NMT model trained on large-scale accessible data with a k-nearestneighbor (kNN) classifier and integrates the external datastore constructed by private text data from all clients to form the final FL model. A two-phase datastore encryption strategy is introduced to achieve privacy-preserving during this process. Extensive experiments show that FedNN significantly reduces computational and communication costs compared with FedAvg, while maintaining promising translation performance in different FL settings. In recent years, neural machine translation (NMT) has significantly improved translation quality (Bahdanau et al., 2015; Vaswani et al., 2017; Hassan et al., 2018) and has been widely adopted in many commercial systems. The current mainstream system is first built on a large-scale corpus collected by the service provider and then directly applied to translation tasks for different users and enterprises. However, this application paradigm faces two critical challenges in practice.
Simple and Scalable Nearest Neighbor Machine Translation
Dai, Yuhan, Zhang, Zhirui, Liu, Qiuzhi, Cui, Qu, Li, Weihua, Du, Yichao, Xu, Tong
Despite being conceptually attractive, kNN-MT is burdened with massive storage requirements and high computational complexity since it conducts nearest neighbor searches over the entire reference corpus. In this paper, we propose a simple and scalable nearest neighbor machine translation framework to drastically promote the decoding and storage efficiency of kNN-based models while maintaining the translation performance. To this end, we dynamically construct an extremely small datastore for each input via sentence-level retrieval to avoid searching the entire datastore in vanilla kNN-MT, based on which we further introduce a distance-aware adapter to adaptively incorporate the kNN retrieval results into the pre-trained NMT models. Experiments on machine translation in two general settings, static domain adaptation, and online learning, demonstrate that our proposed approach not only achieves almost 90% speed as the NMT model without performance degradation, but also significantly reduces the storage requirements of kNN-MT. Domain adaptation is one of the fundamental challenges in machine learning which aspires to cope with the discrepancy across domain distributions and improve the generality of the trained models. It has attracted wide attention in the neural machine translation (NMT) area (Britz et al., 2017; Chen et al., 2017; Chu & Wang, 2018; Bapna & Firat, 2019; Bapna et al., 2019; Wei et al., 2020). Recently, kNN-MT and its variants (Khandelwal et al., 2021; Zheng et al., 2021a;b; Wang et al., 2022a) provide a new paradigm and have achieved remarkable performance for fast domain adaptation by retrieval pipelines. These approaches combine traditional NMT models (Bahdanau et al., 2015; Vaswani et al., 2017) with a token-level k-nearest-neighbour (kNN) retrieval mechanism, allowing it to directly access the domain-specific datastore to improve translation accuracy without fine-tuning the entire model.
Improving Few-Shot Performance of Language Models via Nearest Neighbor Calibration
Nie, Feng, Chen, Meixi, Zhang, Zhirui, Cheng, Xu
Pre-trained language models (PLMs) have exhibited remarkable few-shot learning capabilities when provided a few examples in a natural language prompt as demonstrations of test instances, i.e., in-context learning. However, the performance of in-context learning is susceptible to the choice of prompt format, training examples and the ordering of the training examples. In this paper, we propose a novel nearest-neighbor calibration framework for in-context learning to ease this issue. It is inspired by a phenomenon that the in-context learning paradigm produces incorrect labels when inferring training instances, which provides a useful supervised signal to calibrate predictions. Thus, our method directly augments the predictions with a $k$-nearest-neighbor ($k$NN) classifier over a datastore of cached few-shot instance representations obtained by PLMs and their corresponding labels. Then adaptive neighbor selection and feature regularization modules are introduced to make full use of a few support instances to reduce the $k$NN retrieval noise. Experiments on various few-shot text classification tasks demonstrate that our method significantly improves in-context learning, while even achieving comparable performance with state-of-the-art tuning-based approaches in some sentiment analysis tasks.
Non-Parametric Domain Adaptation for End-to-End Speech Translation
Du, Yichao, Wang, Weizhi, Zhang, Zhirui, Chen, Boxing, Xu, Tong, Xie, Jun, Chen, Enhong
End-to-End Speech Translation (E2E-ST) has received increasing attention due to the potential of its less error propagation, lower latency, and fewer parameters. However, the effectiveness of neural-based approaches to this task is severely limited by the available training corpus, especially for domain adaptation where in-domain triplet training data is scarce or nonexistent. In this paper, we propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system. To this end, we first incorporate an additional encoder into the pre-trained E2E-ST model to realize text translation modelling, and then unify the decoder's output representation for text and speech translation tasks by reducing the correspondent representation mismatch in available triplet training data. During domain adaptation, a k-nearest-neighbor (kNN) classifier is introduced to produce the final translation distribution using the external datastore built by the domain-specific text translation corpus, while the universal output representation is adopted to perform a similarity search. Experiments on the Europarl-ST benchmark demonstrate that when in-domain text translation data is involved only, our proposed approach significantly improves baseline by 12.82 BLEU on average in all translation directions, even outperforming the strong in-domain fine-tuning method.
Approaching Adaptation Guided Retrieval in Case-Based Reasoning through Inference in Undirected Graphical Models
In Case-Based Reasoning, when the similarity assumption does not hold, the retrieval of a set of cases structurally similar to the query does not guarantee to get a reusable or revisable solution. Knowledge about the adaptability of solutions has to be exploited, in order to define a method for adaptation-guided retrieval. We propose a novel approach to address this problem, where knowledge about the adaptability of the solutions is captured inside a metric Markov Random Field (MRF). Nodes of the MRF represent cases and edges connect nodes whose solutions are close in the solution space. States of the nodes represent different adaptation levels with respect to the potential query. Metric-based potentials enforce connected nodes to share the same state, since cases having similar solutions should have the same adaptability level with respect to the query. The main goal is to enlarge the set of potentially adaptable cases that are retrieved without significantly sacrificing the precision and accuracy of retrieval. We will report on some experiments concerning a retrieval architecture where a simple kNN retrieval (on the problem description) is followed by a further retrieval step based on MRF inference.
Exploiting Markov Random Fields to Enhance Retrieval in Case-Based Reasoning
Portinale, Luigi (Universitá del Piemonte Orientale)
The similarity assumption in Case-Based Reasoning (similar problems have similar solutions) has been questioned by several researchers. If knowledge about the adaptability of solutions is available, it can be exploited in order to guide retrieval. Several approaches have been proposed in this context, often assuming a similarity or cost measure defined over the solution space. In this paper, we propose a novel approach where the adaptability of the solutions is captured inside a metric Markov Random Field (MRF). Each case is represented as a node in the MRF, and edges connect cases whose solutions are close in the solution space. States of the nodes represent the adaptability effort with respect to the query. Potentals are defined to enforce connected nodes to share the same state; this models the fact that cases having similar solutions should have the same adaptability effort with respect to the query. The main goal is to enlarge the set of potentially adaptable cases that are retrieved (the recall) without significantly sacrificing the precision of retrieval. We will report on some experiments concerning a retrieval architecture where a simple kNN retrieval is followed by a further retrieval step based on MRF inference.